{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting started with Cross-region Inference in Amazon Bedrock\n",
    "\n",
    "Inference profiles in Amazon Bedrock let you perform cross-region inference in a fully managed, secure, and private manner, without having to set anything up. Inference profiles are a managed feature of Amazon Bedrock that gives generative AI application builders a seamless way to manage traffic bursts and ensure optimal availability and performance. With this feature, builders no longer have to spend time and effort building complex resiliency structures. Instead, inference profiles intelligently route traffic across multiple opted-in regions, automatically adapting to peak utilization surges. Any inference carried out through an inference profile dynamically uses the on-demand capacity available in any of the configured regions, maximizing availability and throughput.\n",
    "\n",
    "## Key Features & Benefits\n",
    "\n",
    "Some of the key features include:\n",
    "\n",
    "* Access to capacity in different regions allowing generative AI workloads to scale with demand.\n",
    "* Access to region-agnostic models to achieve higher throughput.\n",
    "* Resilience to traffic bursts.\n",
    "* Ability to select between the pre-defined region sets suiting application needs.\n",
    "* Compatible with existing APIs.\n",
    "* No additional cost.\n",
    "* Same model pricing as the source region.\n",
    "\n",
    "The end-user experience is not impacted, because the model behind cross-region inference remains the same, and builders can focus on writing logic for a differentiated application.\n",
    "\n",
    "Bedrock is the easiest way to build and scale generative AI applications. Cross-region inference further enhances its usability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Start by installing the dependencies to ensure we have a recent version\n",
    "#!pip install --quiet --upgrade --force-reinstall boto3 botocore awscli\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Helper methods\n",
    "\n",
    "1. Assume an IAM role, if needed\n",
    "2. Use the AWS profile configured for the boto3 client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "from io import StringIO\n",
    "import sys\n",
    "import textwrap\n",
    "import os\n",
    "from typing import Optional\n",
    "\n",
    "# External Dependencies:\n",
    "import boto3\n",
    "from botocore.config import Config\n",
    "\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "def print_ww(*args, width: int = 100, **kwargs):\n",
    "    \"\"\"Like print(), but wraps output to `width` characters (default 100)\"\"\"\n",
    "    buffer = StringIO()\n",
    "    try:\n",
    "        _stdout = sys.stdout\n",
    "        sys.stdout = buffer\n",
    "        print(*args, **kwargs)\n",
    "        output = buffer.getvalue()\n",
    "    finally:\n",
    "        sys.stdout = _stdout\n",
    "    for line in output.splitlines():\n",
    "        print(\"\\n\".join(textwrap.wrap(line, width=width)))\n",
    "        \n",
    "\n",
    "\n",
    "\n",
    "def get_boto_client(\n",
    "    assumed_role: Optional[str] = None,\n",
    "    region: Optional[str] = None,\n",
    "    runtime: Optional[bool] = True,\n",
    "    service_name: Optional[str] = None,\n",
    "):\n",
    "    \"\"\"Create a boto3 client for Amazon Bedrock, with optional configuration overrides\n",
    "\n",
    "    Parameters\n",
    "    ----------\n",
    "    assumed_role :\n",
    "        Optional ARN of an AWS IAM role to assume for calling the Bedrock service. If not\n",
    "        specified, the current active credentials will be used.\n",
    "    region :\n",
    "        Optional name of the AWS Region in which the service should be called (e.g. \"us-east-1\").\n",
    "        If not specified, AWS_REGION or AWS_DEFAULT_REGION environment variable will be used.\n",
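    "    runtime :\n",
    "        Optional flag choosing the client type: True (default) creates a 'bedrock-runtime' client for\n",
    "        model invocation, False creates a 'bedrock' client for control-plane operations. Ignored when\n",
    "        `service_name` is given.\n",
    "    service_name :\n",
    "        Optional explicit service name (e.g. \"sts\" or \"bedrock\"). Takes precedence over `runtime`.\n",
    "    \"\"\"\n",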
    "    if region is None:\n",
    "        target_region = os.environ.get(\"AWS_REGION\", os.environ.get(\"AWS_DEFAULT_REGION\"))\n",
    "    else:\n",
    "        target_region = region\n",
    "\n",
    "    print(f\"Create new client\\n  Using region: {target_region}\")\n",
    "    session_kwargs = {\"region_name\": target_region}\n",
    "    client_kwargs = {**session_kwargs}\n",
    "\n",
    "    profile_name = os.environ.get(\"AWS_PROFILE\")\n",
    "    if profile_name:\n",
    "        print(f\"  Using profile: {profile_name}\")\n",
    "        session_kwargs[\"profile_name\"] = profile_name\n",
    "\n",
    "    retry_config = Config(\n",
    "        region_name=target_region,\n",
    "        retries={\n",
    "            \"max_attempts\": 10,\n",
    "            \"mode\": \"standard\",\n",
    "        },\n",
    "    )\n",
    "    session = boto3.Session(**session_kwargs)\n",
    "\n",
    "    if assumed_role:\n",
    "        print(f\"  Using role: {assumed_role}\", end='')\n",
    "        sts = session.client(\"sts\")\n",
    "        response = sts.assume_role(\n",
    "            RoleArn=str(assumed_role),\n",
    "            RoleSessionName=\"langchain-llm-1\"\n",
    "        )\n",
    "        print(\" ... successful!\")\n",
    "        client_kwargs[\"aws_access_key_id\"] = response[\"Credentials\"][\"AccessKeyId\"]\n",
    "        client_kwargs[\"aws_secret_access_key\"] = response[\"Credentials\"][\"SecretAccessKey\"]\n",
    "        client_kwargs[\"aws_session_token\"] = response[\"Credentials\"][\"SessionToken\"]\n",
    "\n",
    "    if not service_name:\n",
    "        if runtime:\n",
    "            service_name='bedrock-runtime'\n",
    "        else:\n",
    "            service_name='bedrock'\n",
    "\n",
    "    bedrock_client = session.client(\n",
    "        service_name=service_name,\n",
    "        config=retry_config,\n",
    "        **client_kwargs\n",
    "    )\n",
    "\n",
    "    print(\"boto3 Bedrock client successfully created!\")\n",
    "    print(bedrock_client._endpoint)\n",
    "    return bedrock_client"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import boto3\n",
    "print(boto3.__version__)\n",
    "\n",
    "os.environ[\"AWS_PROFILE\"] = '<replace with your profile>'\n",
    "boto3_client =  get_boto_client(region='us-east-1', runtime=False) # switch to the region of your choice\n",
    "boto3_client.list_foundation_models()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "🔴 IMPORTANT❗🔴\n",
    "\n",
    "*Note:* <span style=\"color:red\">This notebook uses the `us-east-1` region and the inference profiles available there as an example.\n",
    "Change the `region_name` and inference profile IDs in the cells below depending on which region you are in and which inference profiles are available there.</span>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make sure you have AWS credentials or an AWS profile set up before running this cell\n",
    "\n",
    "region_name = 'us-east-1' # switch to the region of your choice\n",
    "account_id = get_boto_client(service_name='sts', region=region_name).get_caller_identity().get('Account')\n",
    "bedrock_client = get_boto_client(service_name='bedrock', region=region_name)\n",
    "bedrock_runtime = get_boto_client(service_name='bedrock-runtime', region=region_name)\n",
    "\n",
    "print(account_id, bedrock_client, bedrock_runtime)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understand Inference Profiles\n",
    "\n",
    "The cross-region inference feature comes with two additional APIs in the Bedrock client SDK:\n",
    "\n",
    "* `list_inference_profiles()` -> Lists the inference profiles available to you, including the models configured behind each one.\n",
    "* `get_inference_profile(inferenceProfileIdentifier)` -> Returns the details of a specific inference profile."
   ]
  },
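  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`list_inference_profiles()` can return its results in several pages. The sketch below is not part of the original sample: it assumes the response may carry a `nextToken` and keeps calling the API until no token is returned. The helper name `list_all_inference_profiles` is hypothetical, and `client` is assumed to be the `bedrock` (control-plane) client created above.\n",
    "\n",
    "```python\n",
    "from typing import Any, Dict, List\n",
    "\n",
    "\n",
    "def list_all_inference_profiles(client: Any) -> List[Dict[str, Any]]:\n",
    "    \"\"\"Hypothetical helper: collect every inference profile summary across pages.\"\"\"\n",
    "    summaries: List[Dict[str, Any]] = []\n",
    "    kwargs: Dict[str, Any] = {}\n",
    "    while True:\n",
    "        page = client.list_inference_profiles(**kwargs)\n",
    "        summaries.extend(page.get('inferenceProfileSummaries', []))\n",
    "        token = page.get('nextToken')\n",
    "        if not token:\n",
    "            return summaries\n",
    "        # Request the next page with the token from the previous response.\n",
    "        kwargs['nextToken'] = token\n",
    "\n",
    "\n",
    "# Usage, assuming `bedrock_client` from the cells above:\n",
    "# for profile in list_all_inference_profiles(bedrock_client):\n",
    "#     print(profile['inferenceProfileId'], profile['status'])\n",
    "```"
   ]
  },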
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bedrock_client.list_inference_profiles()['inferenceProfileSummaries']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is important to note that cross-region inference offers two forms:\n",
    "\n",
    "**Foundation model in source region**\n",
    "\n",
    "In this mode, an inference profile is configured only for a model that exists in the source region. For such models, a failover mechanism allows requests to be re-routed to the fulfilment regions configured in the inference profile. By default, requests go to resources in the source region, and only if that region is busy or you hit your quota limit is the request routed to another region. To determine which region the request should go to, Amazon Bedrock checks in real time which region has spare capacity available. The on-demand principle still applies: if none of the regions has capacity to handle the request, it will be throttled.\n",
    "\n",
    "In this mode, if the source region doesn't have a certain model available, you will not be able to access it in a fulfilment region via the cross-region inference feature.\n",
    "\n",
    "**Available via Inference Profiles**\n",
    "\n",
    "A select list of models is made available via cross-region inference, where Amazon Bedrock abstracts away the regional details and manages the hosting and inference routing automatically. These foundation models exist across the pre-defined region sets, so builders can write applications without tying them to a specific region. This provides reliable throughput, access to leading foundation models, and the ability to scale throughput."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bedrock_client.get_inference_profile(\n",
    "    inferenceProfileIdentifier='us.anthropic.claude-3-5-sonnet-20240620-v1:0'\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the `get_inference_profile` API you can check whether an inference profile's `status` is `ACTIVE` and which regions are configured for inference routing."
   ]
  },
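  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To read the routing regions programmatically, the following minimal sketch (not part of the original sample) assumes that the `get_inference_profile` response carries a `models` list whose entries each hold a `modelArn`, and takes the region from the fourth colon-separated field of each ARN. The helper names `profile_regions` and `is_active` are hypothetical.\n",
    "\n",
    "```python\n",
    "from typing import Any, Dict, List\n",
    "\n",
    "\n",
    "def profile_regions(profile: Dict[str, Any]) -> List[str]:\n",
    "    \"\"\"Return the distinct regions of the model ARNs listed in an inference profile.\"\"\"\n",
    "    regions: List[str] = []\n",
    "    for model in profile.get('models', []):\n",
    "        # ARN layout: arn:partition:service:region:account-id:resource\n",
    "        region = model['modelArn'].split(':')[3]\n",
    "        if region not in regions:\n",
    "            regions.append(region)\n",
    "    return regions\n",
    "\n",
    "\n",
    "def is_active(profile: Dict[str, Any]) -> bool:\n",
    "    \"\"\"True if the profile reports an ACTIVE status.\"\"\"\n",
    "    return profile.get('status') == 'ACTIVE'\n",
    "\n",
    "\n",
    "# Usage, assuming `bedrock_client` from the cells above:\n",
    "# profile = bedrock_client.get_inference_profile(\n",
    "#     inferenceProfileIdentifier='us.anthropic.claude-3-5-sonnet-20240620-v1:0')\n",
    "# print(is_active(profile), profile_regions(profile))\n",
    "```"
   ]
  },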
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use Inference Profiles\n",
    "\n",
    "An inference profile is used in the same way as a foundation model, through the model's `modelId` or `arn`. An inference profile also comes with its own ID and ARN, which differ in their prefix. For an inference profile, expect a region-set prefix such as `us.` or `eu.` in front of the model ID. The ARN changes from `arn:aws:bedrock:<region>::foundation-model/<model-id>` to `arn:aws:bedrock:<region>:<account-id>:inference-profile/<region-set-prefix>.<model-id>`.\n",
    "\n",
    "You can use both the `arn` and the `modelId` with the `Converse` API, but only the `modelId` with the `InvokeModel` API."
   ]
  },
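  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The ID and ARN conventions above can be captured in a few string helpers. This is a minimal sketch, assuming the standard `aws` partition; the function names are hypothetical, and a given region-set prefix only works if Amazon Bedrock actually offers an inference profile for that model.\n",
    "\n",
    "```python\n",
    "def to_inference_profile_id(model_id: str, region_set_prefix: str = 'us') -> str:\n",
    "    \"\"\"Prepend a region-set prefix such as 'us' or 'eu' to a foundation model ID.\"\"\"\n",
    "    return f'{region_set_prefix}.{model_id}'\n",
    "\n",
    "\n",
    "def build_foundation_model_arn(region: str, model_id: str) -> str:\n",
    "    # Foundation model ARNs leave the account-id field empty.\n",
    "    return f'arn:aws:bedrock:{region}::foundation-model/{model_id}'\n",
    "\n",
    "\n",
    "def build_inference_profile_arn(region: str, account_id: str, profile_id: str) -> str:\n",
    "    # Inference profile ARNs include your account ID.\n",
    "    return f'arn:aws:bedrock:{region}:{account_id}:inference-profile/{profile_id}'\n",
    "\n",
    "\n",
    "# Usage, assuming `region_name` and `account_id` from the cells above:\n",
    "# profile_id = to_inference_profile_id('anthropic.claude-3-haiku-20240307-v1:0')\n",
    "# print(build_inference_profile_arn(region_name, account_id, profile_id))\n",
    "```"
   ]
  },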
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Converse API\n",
    "\n",
    "Amazon Bedrock supports a unified messaging API for a seamless application-building experience. Read about all the models supported via this API [here](https://docs.aws.amazon.com/bedrock/latest/userguide/conversation-inference.html#conversation-inference-supported-models-features).\n",
    "\n",
    "Let's send a request to both the foundation model and the inference profile to compare the results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from time import time\n",
    "\n",
    "system_prompt = \"You are an expert on AWS services and always provide correct and concise answers.\"\n",
    "input_message = \"Should I be storing documents in Amazon S3 or EFS for cost effective applications?\"\n",
    "modelId = ('Foundation Model', 'anthropic.claude-3-haiku-20240307-v1:0')\n",
    "inferenceProfileId = ('Inference Profile', 'us.anthropic.claude-3-haiku-20240307-v1:0')\n",
    "\n",
    "# Send the same request once to the foundation model and once to the inference profile\n",
    "for inference_type, target_id in [modelId, inferenceProfileId]:\n",
    "    start = time()\n",
    "    response = bedrock_runtime.converse(\n",
    "        modelId=target_id,\n",
    "        system=[{\"text\": system_prompt}],\n",
    "        messages=[{\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [{\"text\": input_message}]\n",
    "        }]\n",
    "    )\n",
    "    end = time()\n",
    "    # Report wall-clock latency with sub-second precision\n",
    "    print(f\"::{inference_type}::{target_id}::Response time: {end - start:.2f} second(s)\")\n",
    "    print(f\"::{inference_type}::Response output: {response['output']['message']['content'][0]['text']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see, the same API call works the same way; only the model ID changes.\n",
    "\n",
    "It is also possible to use a full ARN in place of the model ID:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inference_profile_id = \"us.anthropic.claude-3-sonnet-20240229-v1:0\"\n",
    "inference_profile_arn = f\"arn:aws:bedrock:{region_name}:{account_id}:inference-profile/{inference_profile_id}\"\n",
    "print(inference_profile_arn)\n",
    "\n",
    "response = bedrock_runtime.converse(\n",
    "    modelId=inference_profile_arn,\n",
    "    system=[{\"text\": system_prompt}],\n",
    "    messages=[{\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [{\"text\": input_message}]\n",
    "    }]\n",
    ")\n",
    "print(response['output']['message']['content'][0]['text'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### InvokeModel API\n",
    "\n",
    "Many generative AI applications in production are already built on top of the `InvokeModel` API, and third-party tools also use this API to call models. The cross-region inference feature is compatible with this API as well. While the Converse API supports only select models, all models available in Amazon Bedrock can be called through the InvokeModel API.\n",
    "\n",
    "Let's send a request via both the foundation model ID and the inference profile ID to compare the results with this API."
   ]
  },
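  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the same Anthropic Messages request body is built more than once in this notebook, you may prefer a small helper. The sketch below is not part of the original sample: `build_claude_body` and `extract_text` are hypothetical names, and the body fields simply mirror the ones used in the next cell.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from typing import Any, Dict\n",
    "\n",
    "\n",
    "def build_claude_body(system: str, user_text: str, max_tokens: int = 1024,\n",
    "                      temperature: float = 0.1, top_p: float = 0.9) -> str:\n",
    "    \"\"\"Serialize an Anthropic Messages request body for InvokeModel.\"\"\"\n",
    "    return json.dumps({\n",
    "        'anthropic_version': 'bedrock-2023-05-31',\n",
    "        'max_tokens': max_tokens,\n",
    "        'temperature': temperature,\n",
    "        'top_p': top_p,\n",
    "        'system': system,\n",
    "        'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': user_text}]}],\n",
    "    })\n",
    "\n",
    "\n",
    "def extract_text(response_body: Dict[str, Any]) -> str:\n",
    "    \"\"\"Concatenate all text blocks of a decoded Claude response body.\"\"\"\n",
    "    return ''.join(block.get('text', '') for block in response_body.get('content', [])\n",
    "                   if block.get('type') == 'text')\n",
    "\n",
    "\n",
    "# Usage, assuming `bedrock_runtime`, `system_prompt` and `input_message` from earlier cells:\n",
    "# response = bedrock_runtime.invoke_model(\n",
    "#     modelId='us.anthropic.claude-3-sonnet-20240229-v1:0',\n",
    "#     body=build_claude_body(system_prompt, input_message))\n",
    "# print(extract_text(json.loads(response['body'].read())))\n",
    "```"
   ]
  },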
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "body = json.dumps({\n",
    "    \"anthropic_version\": \"bedrock-2023-05-31\",\n",
    "    \"max_tokens\": 1024,\n",
    "    \"temperature\": 0.1,\n",
    "    \"top_p\": 0.9,\n",
    "    \"system\": system_prompt,\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": f\"{input_message}\",\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "})\n",
    "accept = 'application/json'\n",
    "contentType = 'application/json'\n",
    "modelId = ('Foundation Model', 'anthropic.claude-3-sonnet-20240229-v1:0')\n",
    "inferenceProfileId = ('Inference Profile', 'us.anthropic.claude-3-sonnet-20240229-v1:0')\n",
    "for inference_type, target_id in [modelId, inferenceProfileId]:\n",
    "    start = time()\n",
    "    response = bedrock_runtime.invoke_model(body=body, modelId=target_id, accept=accept, contentType=contentType)\n",
    "    end = time()\n",
    "    # The response body is a stream; read it and decode the JSON payload\n",
    "    response_body = json.loads(response.get('body').read())\n",
    "    print(f\"::{inference_type}::Response time: {end - start:.2f} second(s)\")\n",
    "    print(f\"::{inference_type}::Response output: {response_body['content'][0]['text']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### LangChain\n",
    "\n",
    "Below you can see how to use the cross-region inference feature with [LangChain](https://python.langchain.com/v0.2/docs/integrations/platforms/aws/), one of the popular open-source frameworks."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install --quiet langchain_aws langchain_community"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[LangChain](https://python.langchain.com/v0.2/docs/introduction/) provides the [`langchain-aws`](https://github.com/langchain-ai/langchain-aws) integration package, which supports the `Converse` API via [`ChatBedrockConverse`](https://python.langchain.com/v0.2/docs/integrations/chat/bedrock/#beta-bedrock-converse-api). This allows you to use the latest models, such as Anthropic Claude 3 Sonnet, through this implementation. Below you can see an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_aws import ChatBedrockConverse\n",
    "\n",
    "messages = [\n",
    "    (\n",
    "        \"system\",\n",
    "        \"You are a helpful assistant that shapes sentences into poetic form. Translate the user sentence.\",\n",
    "    ),\n",
    "    (\"human\", \"I love programming.\"),\n",
    "]\n",
    "\n",
    "llm = ChatBedrockConverse(\n",
    "    model='us.anthropic.claude-3-sonnet-20240229-v1:0',\n",
    "    temperature=0,\n",
    "    max_tokens=None,\n",
    "    client=bedrock_runtime,\n",
    ")\n",
    "\n",
    "print(llm.invoke(messages).content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Use ChatBedrock with the inference profile ID (instead of the ARN) as the model ID\n",
    "\n",
    "**Option 1**\n",
    "\n",
    "1. Setting `beta_use_converse_api=True` makes `ChatBedrock` use the `ChatBedrockConverse` class, which invokes the model through the Converse API\n",
    "2. The region (`region_name`) needs to be passed in explicitly for this to work"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_aws import ChatBedrock\n",
    "\n",
    "messages = [\n",
    "    (\n",
    "        \"system\",\n",
    "        \"You are a helpful assistant that shapes sentences into poetic form. Translate the user sentence.\",\n",
    "    ),\n",
    "    (\"human\", \"I love programming.\"),\n",
    "]\n",
    "model_parameter = {\"temperature\": 0.0, \"top_p\": .5, \"max_tokens\": 200}  # max_tokens_to_sample is the legacy Claude Text Completions name\n",
    "llm = ChatBedrock(\n",
    "    model_id='us.anthropic.claude-3-sonnet-20240229-v1:0',\n",
    "    model_kwargs=model_parameter, \n",
    "    beta_use_converse_api=True,  \n",
    "    client=bedrock_runtime,\n",
    "    region_name='us-east-1'\n",
    ")\n",
    "print(llm.invoke(messages).content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use Model parameters\n",
    "\n",
    "Pass the model inference parameters at run time, when invoking the model, rather than when creating the class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_aws import ChatBedrock\n",
    "\n",
    "messages = [\n",
    "    (\n",
    "        \"system\",\n",
    "        \"You are a helpful assistant that shapes sentences into poetic form. Generate a poem from this cue\",\n",
    "    ),\n",
    "    (\"human\", \"I love programming because......\"),\n",
    "]\n",
    "#model_parameter = {\"temperature\": 0.1, \"top_p\": .5, \"max_tokens\": 1000  } #\"max_tokens_to_sample\": 200}\n",
    "llm = ChatBedrock(\n",
    "    model_id='us.anthropic.claude-3-sonnet-20240229-v1:0',\n",
    "    #model_kwargs=model_parameter,\n",
    "    beta_use_converse_api=True,\n",
    "    client=bedrock_runtime,\n",
    "    region_name='us-east-1'\n",
    ")\n",
    "# Runtime parameters take precedence, even if similar parameters were set when creating the class\n",
    "runtime_parameter = {\"temperature\": 0.9,  \"max_tokens\": 100 }\n",
    "print(llm.invoke(messages, **runtime_parameter).content)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Streaming with LangChain\n",
    "\n",
    "This example demonstrates the streaming capabilities of LangChain. It also shows that run-time parameters take precedence: if you set `temperature` both when creating the `ChatBedrock` class and during the invocation, the invocation value is used."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_aws import ChatBedrock\n",
    "\n",
    "messages = [\n",
    "    (\n",
    "        \"system\",\n",
    "        \"You are a helpful assistant that shapes sentences into poetic form. Generate a poem from this cue\",\n",
    "    ),\n",
    "    (\"human\", \"I love programming because......\"),\n",
    "]\n",
    "model_parameter = {\"temperature\": 0.1, \"top_p\": .5, \"max_tokens\": 1000}\n",
    "llm = ChatBedrock(\n",
    "    model_id='us.anthropic.claude-3-sonnet-20240229-v1:0',\n",
    "    model_kwargs=model_parameter,\n",
    "    beta_use_converse_api=True,\n",
    "    client=bedrock_runtime,\n",
    "    region_name='us-east-1'\n",
    ")\n",
    "\n",
    "# These would take precedence \n",
    "runtime_parameter = {\"temperature\": 0.9,  \"max_tokens\": 100 }\n",
    "response = llm.stream(messages, **runtime_parameter)\n",
    "\n",
    "# Extract and print the response text in real-time.\n",
    "for chunk in response:\n",
    "    if chunk and len(chunk.content) > 0:\n",
    "        sys.stdout.write(chunk.content[0].get('text',\"\"))\n",
    "        sys.stdout.flush()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Option 2: specify the model provider**\n",
    "\n",
    "This option does not use the Converse API behind the scenes; it calls the InvokeModel API instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_aws import ChatBedrock\n",
    "\n",
    "messages = [\n",
    "    (\n",
    "        \"system\",\n",
    "        \"You are a helpful assistant that shapes sentences into poetic form. Translate the user sentence.\",\n",
    "    ),\n",
    "    (\"human\", \"I love programming.\"),\n",
    "]\n",
    "\n",
    "model_parameter = {\"temperature\": 0.0, \"top_p\": .5}\n",
    "llm = ChatBedrock(\n",
    "    provider=\"anthropic\",\n",
    "    model_id=\"us.anthropic.claude-3-haiku-20240307-v1:0\",\n",
    "    model_kwargs=model_parameter, \n",
    "    client=bedrock_runtime,\n",
    ")\n",
    "print(llm.invoke(messages).content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
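    "#### Use invoke with the prescribed message format for Claude models\n",
    "\n",
    "Note that `llm.invoke` treats a plain string as a single user message, so the JSON below reaches the model as message text rather than as request parameters."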
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "messages = json.dumps({\n",
    "    \"anthropic_version\": \"bedrock-2023-05-31\",\n",
    "    \"max_tokens\": 1024,\n",
    "    \"temperature\": 0.1,\n",
    "    \"top_p\": 0.9,\n",
    "    \"system\": 'You are a helpful assistant that shapes sentences into poetic form. Translate the user sentence.',\n",
    "    \"messages\": [\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\n",
    "                    \"type\": \"text\",\n",
    "                    \"text\": \"I love programming.\",\n",
    "                }\n",
    "            ]\n",
    "        }\n",
    "    ]\n",
    "})\n",
    "model_parameter = {\"temperature\": 0.0, \"top_p\": .5}\n",
    "llm = ChatBedrock(\n",
    "    provider=\"anthropic\",\n",
    "    model_id=\"us.anthropic.claude-3-haiku-20240307-v1:0\",\n",
    "    model_kwargs=model_parameter, \n",
    "    client=bedrock_runtime,\n",
    ")\n",
    "print(llm.invoke(messages).content)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Monitoring, Logging, and Metrics\n",
    "\n",
    "Using cross-region inference might route your request to a different region, depending on the selected inference profile.\n",
    "\n",
    "If a request gets re-routed, the [Bedrock model invocation log](https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html) records the region used in the `inferenceRegion` element of the JSON log entry.\n",
    "\n",
    "The code snippet below enables model invocation logging to a CloudWatch log group and then creates a CloudWatch metric filter that selects and counts all model invocations routed to a fulfilment region. You can adapt this code to build a metric for a specific target region or to monitor latency by region.\n",
    "\n",
    "It first creates an IAM role with the permissions needed to deliver the model invocation log to CloudWatch Logs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "\n",
    "# see https://docs.aws.amazon.com/bedrock/latest/userguide/model-invocation-logging.html\n",
    "\n",
    "log_group_name = 'bedrock-model-invocation-logging'\n",
    "role_name = 'AmazonBedrockModelInvocationCWDeliveryRole'\n",
    "assume_role_policy_doc = {\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Principal\": {\n",
    "                \"Service\": \"bedrock.amazonaws.com\"\n",
    "            },\n",
    "            \"Action\": \"sts:AssumeRole\",\n",
    "            \"Condition\": {\n",
    "                \"StringEquals\": {\n",
    "                    \"aws:SourceAccount\": account_id,\n",
    "                },\n",
    "                \"ArnLike\": {\n",
    "                    \"aws:SourceArn\": f\"arn:aws:bedrock:{region_name}:{account_id}:*\",\n",
    "                }\n",
    "            }\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "iam_client = boto3.client('iam', region_name=region_name)\n",
    "try:\n",
    "    response = iam_client.create_role(\n",
    "        RoleName=role_name,\n",
    "        AssumeRolePolicyDocument=json.dumps(assume_role_policy_doc),\n",
    "    )\n",
    "except iam_client.exceptions.EntityAlreadyExistsException:\n",
    "    pass\n",
    "policy_doc = {\n",
    "    \"Version\": \"2012-10-17\",\n",
    "    \"Statement\": [\n",
    "        {\n",
    "            \"Effect\": \"Allow\",\n",
    "            \"Action\": [\n",
    "                \"logs:CreateLogStream\",\n",
    "                \"logs:PutLogEvents\",\n",
    "            ],\n",
    "            \"Resource\": f\"arn:aws:logs:{region_name}:{account_id}:log-group:{log_group_name}:log-stream:aws/bedrock/modelinvocations\",\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "iam_client.put_role_policy(\n",
    "    RoleName=role_name,\n",
    "    PolicyName=role_name,\n",
    "    PolicyDocument=json.dumps(policy_doc),\n",
    ")\n",
    "\n",
    "logs_client = boto3.client('logs', region_name=region_name)\n",
    "try:\n",
    "    logs_client.create_log_group(\n",
    "        logGroupName=log_group_name,\n",
    "    )\n",
    "except logs_client.exceptions.ResourceAlreadyExistsException:\n",
    "    pass\n",
    "logs_client.put_retention_policy(\n",
    "    logGroupName=log_group_name,\n",
    "    retentionInDays=365\n",
    ")\n",
    "\n",
    "response = bedrock_client.put_model_invocation_logging_configuration(\n",
    "    loggingConfig={\n",
    "        'cloudWatchConfig': {\n",
    "            'logGroupName': log_group_name,\n",
    "            'roleArn': f\"arn:aws:iam::{account_id}:role/{role_name}\",\n",
    "        },\n",
    "        'textDataDeliveryEnabled': False,  # set to True to also log prompt and response text\n",
    "        'imageDataDeliveryEnabled': False,\n",
    "        'embeddingDataDeliveryEnabled': False,\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With the model invocation log enabled and configured for CloudWatch Logs, you can now create a CloudWatch metric filter on the ingested JSON log entries, filtering on whether `inferenceRegion` matches your source or target region:"
   ]
  },
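  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The selection described above can also be checked locally before relying on the metric. The sketch below is hypothetical and not part of the original sample: it counts JSON log lines whose `inferenceRegion` differs from the source region, and it assumes that entries without an `inferenceRegion` field were not re-routed.\n",
    "\n",
    "```python\n",
    "import json\n",
    "from typing import Iterable\n",
    "\n",
    "\n",
    "def count_rerouted(log_lines: Iterable[str], source_region: str) -> int:\n",
    "    \"\"\"Count invocation log entries served outside `source_region`.\"\"\"\n",
    "    count = 0\n",
    "    for line in log_lines:\n",
    "        region = json.loads(line).get('inferenceRegion')\n",
    "        # Assumption: a missing inferenceRegion means the request stayed in the source region.\n",
    "        if region is not None and region != source_region:\n",
    "            count += 1\n",
    "    return count\n",
    "\n",
    "\n",
    "# Usage, assuming `logs_client`, `log_group_name` and `region_name` from the cells above\n",
    "# (filter_log_events returns one page of events; follow nextToken for more):\n",
    "# events = logs_client.filter_log_events(logGroupName=log_group_name)['events']\n",
    "# print(count_rerouted((e['message'] for e in events), region_name))\n",
    "```"
   ]
  },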
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "logs_client.put_metric_filter(\n",
    "    logGroupName=log_group_name,\n",
    "    filterName='bedrock_cross_region_inference_rerouted',\n",
    "    filterPattern=f'{{ $.inferenceRegion != \"{region_name}\" }}',\n",
    "    metricTransformations=[\n",
    "        {\n",
    "            'metricNamespace': 'Custom',\n",
    "            'metricName': 'bedrock_cross_region_inference_rerouted',\n",
    "            'metricValue': '1',\n",
    "            'defaultValue': 0,\n",
    "            'unit': 'Count',\n",
    "        }\n",
    "    ]\n",
    ")"
   ]
  },
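  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The **Cleanup** note at the end of this notebook recommends disabling the model invocation log. A minimal sketch for undoing the monitoring setup above follows; the helper name `cleanup_invocation_logging` is hypothetical. Deleting the log group also deletes the logs it holds, so run it only when you no longer need them and nothing else uses these resources.\n",
    "\n",
    "```python\n",
    "from typing import Any\n",
    "\n",
    "\n",
    "def cleanup_invocation_logging(bedrock: Any, logs: Any, iam: Any, log_group: str,\n",
    "                               filter_name: str, role: str) -> None:\n",
    "    \"\"\"Remove the logging configuration, metric filter, log group and IAM role.\"\"\"\n",
    "    # Stop Bedrock from delivering model invocation logs.\n",
    "    bedrock.delete_model_invocation_logging_configuration()\n",
    "    # Remove the metric filter, then the log group itself.\n",
    "    logs.delete_metric_filter(logGroupName=log_group, filterName=filter_name)\n",
    "    logs.delete_log_group(logGroupName=log_group)\n",
    "    # Inline policies must be removed before the role can be deleted.\n",
    "    iam.delete_role_policy(RoleName=role, PolicyName=role)\n",
    "    iam.delete_role(RoleName=role)\n",
    "\n",
    "\n",
    "# Usage, assuming the clients and names from the cells above:\n",
    "# cleanup_invocation_logging(bedrock_client, logs_client, iam_client, log_group_name,\n",
    "#                            'bedrock_cross_region_inference_rerouted', role_name)\n",
    "```"
   ]
  },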
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "Cross-region inference provides the ability to manage bursts and spiky traffic patterns across a variety of generative AI workloads and disparate request shapes. With this feature you can easily scale your workloads in production without heavy lifting, lengthy migrations, or the overhead of infrastructure management. Amazon Bedrock handles the routing securely, reliably, and transparently while giving you the control you need.\n",
    "\n",
    "### Key considerations\n",
    "While building or migrating applications to use cross-region inference, keep the following in mind:\n",
    "- **Latency** - If your application is latency sensitive, test cross-region inference thoroughly before fully relying on it, since routing to different regions could increase latency and thus affect your application's behavior.\n",
    "- **Compliance** - Cross-region inference comes with pre-defined region sets. If these sets contain a region where you can't operate, or that goes against your policies, do not use inference profiles; use foundation models directly instead.\n",
    "- **Determinism** - If you need control over where your requests are (or will be) re-routed other than the source region, consider relying on the foundation model directly. Also, opt for models that are exclusive to cross-region inference with care.\n",
    "- **Variety** - Inference profiles do not provide access to multiple different models from different regions. They either act as a failover mechanism for models in the source region or provide inference-profile-exclusive models where the construct of a region is abstracted away. If your business needs to span a variety of foundation models from multiple regions, consider building your own architecture via VPC Peering/Transit Gateway or other architectural patterns.\n",
    "\n",
    "### Cleanup\n",
    "\n",
    "If you ran the **Monitoring, Logging, and Metrics** code, consider disabling the model invocation log to avoid incurring associated costs."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "AGR",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
